We propose efficient nonparametric statistics to compare medical imaging modalities in multi-reader multi-test
data and to compare markers in longitudinal ROC data. The proposed methods are based on the weighted area
under the ROC curve, which includes the area under the curve and the partial area under the curve as special cases.
The methods maximize the local power for detecting the difference between imaging modalities. The asymptotic
results of the proposed methods are developed under a complex correlation structure. Our simulation studies show
that the proposed statistics result in much better powers than existing statistics. We applied the proposed statistics
to an endometriosis diagnosis study. Copyright © 2010 John Wiley & Sons, Ltd.

1. Introduction

In medical imaging studies, one is concerned about whether a newly developed imaging modality is more accurate than
traditional modalities to correctly discriminate a subject with abnormal lesions from a subject without such lesions.
Imaging modalities are considered as an example of diagnostic markers, which are used to distinguish a subject with
a particular condition ("the diseased") from a subject without the condition ("the non-diseased"). For diagnostic markers
that generate binary test results, their accuracy can be summarized in terms of sensitivity (probability of identifying a
diseased subject when the disease truly exists) and specificity (probability of correctly ruling out a non-diseased subject
when the disease is truly absent). For diagnostic markers that generate discrete or continuous test results, the receiver
operating characteristic (ROC) curve is a standard statistical tool to describe and compare the accuracy of markers [1].
The ROC curve combines all possible pairs of sensitivities and 1 − specificities from different decision thresholds and thus
describes the accuracy of markers apart from decision thresholds.

For correlated results from two diagnostic markers, parametric and nonparametric methods have been proposed to 
compare ROC summary measures. Parametric methods for the area under the curve (AUC) assume distributions (e.g. 
negative exponential, normal, lognormal, gamma) on marker measurements [2, 3]. These methods may not perform
well if the parametric assumptions are invalid. A semiparametric ROC estimation method based on logistic regression is
proposed by [4]. As an alternative, nonparametric methods do not require distributional assumptions and are robust to model
misspecification. Nonparametric methods to estimate and compare two AUCs have been proposed by [5], [6], and others.
These methods are based on results for U-statistics because an empirical AUC statistic is essentially a Wilcoxon rank 
sum statistic [7]. However, if two ROC curves intersect, their AUCs may be equal and do not provide valid information 
for the comparison. Moreover, summarizing the entire ROC curve may include irrelevant information about the marker's 
accuracy when one is only interested in some range of specificities. For example, acceptable specificities are high for 
early cancer detection tests. The partial area under the curve (pAUC), which summarizes part of the ROC curve in the 
range of desired specificities, may be a better alternative. Nonparametric methods to compare pAUCs are proposed by 
[8]. Utilizing the pAUCs is particularly important in comparing markers which are developed to screen a large 
population for certain diseases, for example, breast cancer [9]. A lower specificity for a large population leads to 
many more falsely classified non-diseased subjects who may have to undergo a more invasive test subsequently. It 
is thus desired to compare screening markers at a higher range of specificities. 

In this paper we propose efficient nonparametric ROC statistics to analyze multi-reader multi-test ROC data 
and to nonparametrically summarize correlated longitudinal ROC data. The proposed method not only includes 
many nonparametric ROC summary measures as special cases, but also maximizes the local power for detecting the 
difference between markers. The rest of the article is organized as follows. In Section 2 we introduce the new statistics 
for multi-reader multi-test ROC data and longitudinal ROC data, and discuss the equivalence between our statistics and 
the generalized Wilcoxon statistics under specific assumptions. Section 3 gives the variance expressions for the proposed 
statistics. Section 4 reports simulation results to illustrate the small sample performance of the proposed ROC statistics 
and their theoretical variances. Section 5 applies the proposed method to a real example on the diagnosis of endometriosis. 
Section 6 gives some discussion.



2. Methods 

2.1. Definition of nonparametric ROC summary statistics

We first define some notation. Suppose test result $X_{\ell ip}$ of marker $\ell$ is from the $p$th abnormal location in the diseased
subject $i$, where $\ell = 1, \ldots, L$, $p = 0, 1, \ldots, m_{\ell i}$, and $i = 1, \ldots, M$. Test result $Y_{\ell jq}$ of marker $\ell$ is from the $q$th normal location
in the non-diseased subject $j$, where $\ell = 1, \ldots, L$, $q = 0, 1, \ldots, n_{\ell j}$, and $j = 1, \ldots, J$. Here the total number of subjects is $N = M + J$.
The joint pairwise cumulative function of $(X_{\ell_1 ip_1}, X_{\ell_2 ip_2})$ is taken to be $S_{D,\ell_1,\ell_2}(x_1, x_2)$, $p_1, p_2 = 1, \ldots, m_{\ell i}$, with
marginal survival functions $X_{\ell ip} \sim S_{D,\ell}(x)$. Similarly we define $(Y_{\ell_1 jq_1}, Y_{\ell_2 jq_2}) \sim S_{\bar D,\ell_1,\ell_2}(y_1, y_2)$, $q_1, q_2 = 1, \ldots, n_{\ell j}$,
with marginal survival functions $Y_{\ell jq} \sim S_{\bar D,\ell}(y)$. The ROC curve for the $\ell$th marker is then given by
$ROC_\ell(u) = S_{D,\ell}\{S^{-1}_{\bar D,\ell}(u)\}$, where the false positive rate (FPR) $u$ is in $[0, 1]$. The resulting $\ell$th weighted area under the curve (wAUC)
is

$$\Omega_\ell = \int_0^1 S_{D,\ell}\{S^{-1}_{\bar D,\ell}(u)\}\,dW(u), \tag{1}$$

with a probability measure $W(u)$ defined on $u$, for $u \in [0, 1]$. Included in this class of accuracy measures are AUC, pAUC
between FPRs $u_1$ and $u_2$, and the sensitivity at a given level of FPR $u_0$. $W(u)$ can also be defined as certain distribution
functions, such as the beta cdf, to assign varying weight to the specificity. The detailed discussion is in [10].

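By substituting the functions $S_{D,\ell}$ and $S_{\bar D,\ell}$ with their respective empirical functions $\hat S_{D,\ell}$ and $\hat S_{\bar D,\ell}$, the nonparametric
wAUC estimator is given by $\hat\Omega_\ell = \int_0^1 \hat S_{D,\ell}\{\hat S^{-1}_{\bar D,\ell}(u)\}\,dW(u)$. The empirical survival functions $\hat S_{D,\ell}$ and $\hat S_{\bar D,\ell}$ are defined
by

$$\hat S_{D,\ell}(x) = \frac{1}{\sum_{i=1}^{M} m_{\ell i}} \sum_{i=1}^{M} \sum_{p=1}^{m_{\ell i}} I(X_{\ell ip} > x),$$

$$\hat S_{\bar D,\ell}(x) = \frac{1}{\sum_{j=1}^{J} n_{\ell j}} \sum_{j=1}^{J} \sum_{q=1}^{n_{\ell j}} I(Y_{\ell jq} > x). \tag{2}$$

Denote $\Omega = (\Omega_1, \Omega_2, \ldots, \Omega_L)$. By substituting $\hat S_{D,\ell}$ and $\hat S_{\bar D,\ell}$ in Equation (1), the nonparametric estimator of $\Omega$ is given
by $\hat\Omega = (\hat\Omega_1, \hat\Omega_2, \ldots, \hat\Omega_L)$.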

We define $W(u) = u$ for $0 < u < 1$ to obtain the nonparametric AUC estimator for the $\ell$th marker as follows

$$\hat\Omega_{A,\ell} = \frac{1}{\sum_{i=1}^{M} m_{\ell i} \sum_{j=1}^{J} n_{\ell j}} \sum_{i=1}^{M} \sum_{p=1}^{m_{\ell i}} \sum_{j=1}^{J} \sum_{q=1}^{n_{\ell j}} I(X_{\ell ip} > Y_{\ell jq}). \tag{3}$$

The AUC statistic in (3) takes the form of the Wilcoxon rank-sum statistic. It essentially compares the measurements
of abnormal locations with those of normal locations. To calculate this statistic, we obtain every possible pair of
measurements from an abnormal location and a normal location. We assign 1 if the abnormal location's measurement
is larger than the normal location's in the pair, and 0 otherwise. $\hat\Omega_{A,\ell}$ is then calculated by averaging the 1's and 0's over all
possible pairs. Since the location within each subject is viewed as the unit of sampling, the inference based on the regular
Wilcoxon rank-sum statistic is not valid here.
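The pairwise counting behind the AUC statistic can be sketched in a few lines. This is an illustrative sketch only, not the authors' software: the function name `empirical_auc` and the toy measurements are hypothetical, and the data layout (one list of location measurements per subject) is an assumption.

```python
def empirical_auc(diseased, nondiseased):
    """Pooled empirical AUC for clustered data (illustrative sketch).

    diseased:    list of per-subject lists of abnormal-location measurements
    nondiseased: list of per-subject lists of normal-location measurements
    """
    xs = [x for subject in diseased for x in subject]
    ys = [y for subject in nondiseased for y in subject]
    # Score 1 for every (abnormal, normal) pair with X > Y, 0 otherwise.
    wins = sum(1 for x in xs for y in ys if x > y)
    return wins / (len(xs) * len(ys))


# Hypothetical toy data: 2 diseased and 2 non-diseased subjects.
diseased = [[3.1, 2.4], [1.9]]
nondiseased = [[0.5, 2.0], [1.2, 2.8, 0.9]]
auc = empirical_auc(diseased, nondiseased)  # 12 of 15 pairs have X > Y
```

The point estimate pools all locations, but its variance has to account for the within-subject clustering, which is why the regular Wilcoxon rank-sum inference does not carry over.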

When $W(u) = (u - u_1)/(u_2 - u_1)$ for $0 < u_1 < u < u_2 < 1$, $\hat\Omega_\ell$ empirically estimates the partial AUC (pAUC), and
its explicit form is given by

$$\frac{1}{\sum_{i=1}^{M} m_{\ell i} \sum_{j=1}^{J} n_{\ell j}} \sum_{i=1}^{M} \sum_{p=1}^{m_{\ell i}} \sum_{j=1}^{J} \sum_{q=1}^{n_{\ell j}} I\left(X_{\ell ip} > Y_{\ell jq},\; Y_{\ell jq} \in \left(\hat S^{-1}_{\bar D,\ell}(u_2), \hat S^{-1}_{\bar D,\ell}(u_1)\right)\right). \tag{4}$$

The pAUC statistic in (4) uses all measurements from the abnormal locations. Since the pAUC is specified to be in the
range of $(u_1, u_2)$, only measurements from the normal locations which fall in $(\hat S^{-1}_{\bar D,\ell}(u_2), \hat S^{-1}_{\bar D,\ell}(u_1))$ are used in (4). That
is, we sort all measurements from the normal locations from the smallest to the largest, and obtain the order statistics
$Y_{\ell,(\lceil (1-u_2)\sum_j n_{\ell j} \rceil)}$ and $Y_{\ell,(\lceil (1-u_1)\sum_j n_{\ell j} \rceil)}$, where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. We then
calculate the Wilcoxon rank-sum-like statistic by comparing all X's with the Y's which are between these two order statistics.
The pAUC statistic is useful in disease screening when a high FPR would lead to a large number of
falsely diagnosed subjects. It is desirable to evaluate and compare the marker accuracy at the low FPRs rather than the
entire range of FPRs. When we are interested in the sensitivity of the $\ell$th marker at a particular threshold, say $c$, we can
specify the probability measure to be a point mass at $u_0 = S_{\bar D,\ell}(c)$. The estimator $\hat\Omega_\ell$ then becomes

$$\frac{1}{\sum_{i=1}^{M} m_{\ell i}} \sum_{i=1}^{M} \sum_{p=1}^{m_{\ell i}} I\left(X_{\ell ip} > Y_{\ell,(\lceil (1-u_0)\sum_j n_{\ell j} \rceil)}\right). \tag{5}$$

The estimator in (5) is obtained by comparing all X's with $Y_{\ell,(\lceil (1-u_0)\sum_j n_{\ell j} \rceil)}$.
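The order-statistic recipe for the pAUC and for the sensitivity at a fixed FPR can be sketched as follows. This is a hedged illustration on pooled measurements: the function names and toy data are hypothetical, and treating the two cut-offs as an open interval (so the order statistics themselves are excluded) is an assumption about how boundary values are handled.

```python
import math


def order_stat(sorted_ys, u):
    """Return Y_(ceil((1 - u) * n)), the 1-indexed order statistic used as the FPR-u cut-off."""
    n = len(sorted_ys)
    k = min(n, max(1, math.ceil((1 - u) * n)))
    return sorted_ys[k - 1]


def empirical_pauc(xs, ys, u1, u2):
    """Count pairs with X > Y, using only Y's strictly between the two cut-offs."""
    sorted_ys = sorted(ys)
    lower = order_stat(sorted_ys, u2)  # cut-off at the larger FPR
    upper = order_stat(sorted_ys, u1)  # cut-off at the smaller FPR
    wins = sum(1 for x in xs for y in ys if lower < y < upper and x > y)
    return wins / (len(xs) * len(ys))


def empirical_sensitivity(xs, ys, u0):
    """Proportion of X's above the cut-off that gives an empirical FPR of u0."""
    cutoff = order_stat(sorted(ys), u0)
    return sum(1 for x in xs if x > cutoff) / len(xs)


# Hypothetical pooled measurements (abnormal locations xs, normal locations ys).
xs = [4.5, 7.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pauc = empirical_pauc(xs, ys, u1=0.25, u2=0.75)
sens = empirical_sensitivity(xs, ys, u0=0.25)
```

In this toy case the cut-offs are the 2nd and 6th order statistics (2.0 and 6.0), so only the Y values 3, 4 and 5 enter the pAUC count, while the sensitivity compares each X with 6.0.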

In the following sections, we propose efficient nonparametric methods based on the nonparametric estimator of $\Omega$ to
evaluate and compare multiple markers in multi-reader multi-test ROC data and longitudinal ROC data.

2.2. Multi-reader multi-test ROC data 

One type of complex marker data arises frequently in medical imaging studies when radiological images of a patient
are evaluated by several radiologists. [11] consider a mixed-effect ANOVA model while allowing for correlation
among AUC estimators. Their model requires a specific covariance structure among the AUCs. [12] propose a
pseudo-generalized estimating equation method and derive large sample theory for the estimators. Their method
remains valid under the working-independence assumption.

In a multi-reader multi-test ROC study, suppose radiologist $r$, $r = 1, \ldots, R$, rates images for $M$ diseased subjects
and $J$ non-diseased subjects from $L$ imaging devices. A radiologist can give one or more ratings to suspicious locations
in each subject, that is, $m_{\ell i}, n_{\ell j} \ge 1$. We consider $L = 2$. Denote $\Omega_1, \ldots, \Omega_R$ as wAUCs from $R$ readers for modality 1, and
$\Omega_{R+1}, \ldots, \Omega_{2R}$ as wAUCs from $R$ readers for modality 2. Common nonparametric approaches for comparing imaging
modalities take the difference $\hat\Omega_r - \hat\Omega_{R+r}$ between two devices for reader $r$, and then average these differences over all
readers [13]. We can see that such methods are a special case of the linear combination of the weighted AUC statistics
for reader-modality combinations. Rather than the simple average of all $\hat\Omega_r - \hat\Omega_{R+r}$'s, we propose to use the following
weighted linear combination to possibly achieve a higher power to compare markers

$$\Delta_m = \left(\sum_{r=1}^{R} w_r\right)^{-1} \sum_{r=1}^{R} \left[w_r(\Omega_r - \Omega_{R+r})\right], \tag{6}$$

with positive and bounded weights $W = (w_1, w_2, \ldots, w_R)'$. The parameter $\Delta_m$ can be empirically estimated by

$$\hat\Delta_m = \left(\sum_{r=1}^{R} w_r\right)^{-1} \sum_{r=1}^{R} \left[w_r(\hat\Omega_r - \hat\Omega_{R+r})\right],$$

which compares two modalities with multiple readers.

Various choices of weights exist in the ROC literature. $W$ may not depend on the data. For instance, if all readers are
assumed to be homogeneous with regard to their accuracy of rating images, an equal weight $w_r = 1/R$ can be assigned
to reader $r$, $r = 1, \ldots, R$. Then with $m_{\ell i} = n_{\ell j} = 1$ and $W(u) = 1$ at $0 < u < 1$, $\hat\Delta_m$ becomes the AUC statistic in [13].
When one has to estimate $W$ from the data, the consistency of the estimated weights $\hat W$ in probability is required for the
derivation. For instance, a set of optimal weights is introduced by [14] and further developed by [15], who argues that
when readers' experience varies greatly, using equal weights may yield a biased AUC estimate. Let the $R \times R$ covariance
matrix of estimated AUC differences, $(\hat\Omega_1 - \hat\Omega_{R+1}, \ldots, \hat\Omega_R - \hat\Omega_{2R})'$, be $\Sigma_\Delta$, and its consistent estimator $\hat\Sigma_\Delta$. They then
choose $\hat W = \hat\Sigma_\Delta^{-1}\mathbf{1}$ to obtain a consistent estimator for the AUC difference, where $\mathbf{1}$ is an $R$-dimensional vector of ones.
[14] and [15] show that this set of weights is optimal since it maximizes the local power to detect the AUC difference
between imaging modalities. It is clear that by combining these weights with $m_{\ell i} = n_{\ell j} = 1$ and $W(u) = 1$ at $0 < u < 1$,
$\hat\Delta_m$ becomes [15]'s statistic. To properly calculate the weights for the proposed statistic, we need to obtain the covariance
matrix $\Sigma$ of $\hat\Omega = (\hat\Omega_1, \ldots, \hat\Omega_{2R})'$. Since in practice $\Sigma$ is unknown, its consistent estimator $\hat\Sigma$ can be obtained using the
explicit expression (A.1) derived in the Appendix. Since $\Sigma$ and $\Sigma_\Delta$ are related via

$$\Sigma_\Delta = A'\Sigma A,$$

where the $r$th column of the $2R \times R$ matrix $A$ has 1 at the $r$th row, $-1$ at the $(R + r)$th row, and 0 at other rows, so that
$A'\hat\Omega$ is the vector of reader-wise differences, the estimated weights
are given by

$$\hat W = \hat\Sigma_\Delta^{-1}\mathbf{1} = (A'\hat\Sigma A)^{-1}\mathbf{1}. \tag{7}$$
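The weight calculation and the weighted difference can be illustrated with a small standard-library sketch. Everything here is hypothetical: the helper names, the toy covariance matrix and the wAUC estimates are made up, and the contrast matrix is assumed to carry +1 in row r and −1 in row R + r of column r, so that A'ΣA is the covariance of the reader-wise differences.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def optimal_weights(sigma, R):
    """Weights Sigma_Delta^{-1} 1, with Sigma_Delta = A' Sigma A (assumed +1/-1 contrast matrix)."""
    A = [[0.0] * R for _ in range(2 * R)]
    for r in range(R):
        A[r][r] = 1.0
        A[R + r][r] = -1.0
    sigma_delta = matmul(matmul(transpose(A), sigma), A)
    return solve(sigma_delta, [1.0] * R)


def weighted_difference(omega_hat, weights):
    """Weighted average of the reader-wise differences Omega_r - Omega_{R+r}."""
    R = len(weights)
    diffs = [omega_hat[r] - omega_hat[R + r] for r in range(R)]
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


# Hypothetical example with R = 2 readers and a diagonal covariance matrix.
sigma = [[0.01, 0.0, 0.0, 0.0],
         [0.0, 0.02, 0.0, 0.0],
         [0.0, 0.0, 0.01, 0.0],
         [0.0, 0.0, 0.0, 0.02]]
omega_hat = [0.80, 0.70, 0.90, 0.85]
weights = optimal_weights(sigma, R=2)
delta_m = weighted_difference(omega_hat, weights)
```

With this toy covariance the reader whose difference is estimated more precisely receives twice the weight of the other, which is the mechanism by which the weighted statistic can gain power over the simple average.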

2.3. Longitudinal biomarker data 

Another example of complex marker data comes from longitudinal studies when marker measurements are taken at several
times during the studies. Most methodology for longitudinal ROC data relies on appropriate assumptions on the distributions
of marker measurements [16]. In longitudinal ROC data, suppose $L$ markers are measured on $M$ diseased patients and $J$
non-diseased patients at times $t_1, t_2, \ldots, t_K$.
Suppose each subject is repeatedly measured for every marker at each time. Let $X_{\ell ipk}$ denote the test result of marker $\ell$
in the $p$th repetition on the diseased subject $i$ at time $t_k$, where $\ell = 1, \ldots, L$, $p = 1, \ldots, m_{\ell ik}$, $i = 1, \ldots, M$, and $k = 1, \ldots, K$.
Let $Y_{\ell jqk}$ denote the test result of the $\ell$th marker on the $q$th repetition in the non-diseased subject $j$ at time $t_k$, where $\ell = 1, \ldots, L$,
$q = 1, \ldots, n_{\ell jk}$, $j = 1, \ldots, J$, and $k = 1, \ldots, K$. The nonparametric wAUC estimator for the $\ell$th marker is then given by
$\hat\Omega_\ell = \int_0^1 \hat S_{D,\ell}\{\hat S^{-1}_{\bar D,\ell}(u)\}\,dW(u)$, where $\hat S_{D,\ell}$ and $\hat S_{\bar D,\ell}$ are defined by

$$\hat S_{D,\ell}(x) = \frac{1}{\sum_{i=1}^{M}\sum_{k=1}^{K} m_{\ell ik}} \sum_{i=1}^{M} \sum_{k=1}^{K} \sum_{p=1}^{m_{\ell ik}} I(X_{\ell ipk} > x),$$

$$\text{and}\quad \hat S_{\bar D,\ell}(x) = \frac{1}{\sum_{j=1}^{J}\sum_{k=1}^{K} n_{\ell jk}} \sum_{j=1}^{J} \sum_{k=1}^{K} \sum_{q=1}^{n_{\ell jk}} I(Y_{\ell jqk} > x). \tag{8}$$

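By defining $W(u)$ accordingly in the wAUC estimator, we obtain the nonparametric AUC estimator for the $\ell$th marker:

$$\frac{1}{\sum_{i=1}^{M}\sum_{k=1}^{K} m_{\ell ik} \sum_{j=1}^{J}\sum_{k=1}^{K} n_{\ell jk}} \sum_{i=1}^{M} \sum_{k_1=1}^{K} \sum_{p=1}^{m_{\ell ik_1}} \sum_{j=1}^{J} \sum_{k_2=1}^{K} \sum_{q=1}^{n_{\ell jk_2}} I(X_{\ell ipk_1} > Y_{\ell jqk_2}),$$

the partial AUC estimator:

$$\frac{\sum_{i=1}^{M} \sum_{k_1=1}^{K} \sum_{p=1}^{m_{\ell ik_1}} \sum_{j=1}^{J} \sum_{k_2=1}^{K} \sum_{q=1}^{n_{\ell jk_2}} I\left(X_{\ell ipk_1} > Y_{\ell jqk_2},\; Y_{\ell jqk_2} \in \left(\hat S^{-1}_{\bar D,\ell}(u_2), \hat S^{-1}_{\bar D,\ell}(u_1)\right)\right)}{\sum_{i=1}^{M}\sum_{k=1}^{K} m_{\ell ik} \sum_{j=1}^{J}\sum_{k=1}^{K} n_{\ell jk}},$$

and the sensitivity estimator at the FPR of $u_0$,

$$\frac{1}{\sum_{i=1}^{M}\sum_{k=1}^{K} m_{\ell ik}} \sum_{i=1}^{M} \sum_{k=1}^{K} \sum_{p=1}^{m_{\ell ik}} I\left(X_{\ell ipk} > Y_{\ell,(\lceil (1-u_0)\sum_{j=1}^{J}\sum_{k=1}^{K} n_{\ell jk} \rceil)}\right).$$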

We define $h$ to be a real-valued function of $\Omega$. Here the function $h$ is defined on $\mathbb{R}^L$, and has continuous partial
derivatives of order 2. Let the ROC summary measure be $\Delta_h = h(\Omega)$. Its empirical estimator is given by

$$\hat\Delta_h = h(\hat\Omega) = h\left(\int_0^1 \hat S_{D,1}\{\hat S^{-1}_{\bar D,1}(u)\}\,dW(u), \ldots, \int_0^1 \hat S_{D,L}\{\hat S^{-1}_{\bar D,L}(u)\}\,dW(u)\right). \tag{9}$$

The statistic above can be used to compare two longitudinal markers when $h$ is a linear contrast. $\hat\Delta_h$ also includes a
broad range of ROC statistics. It is the weighted AUC statistic in [17] and later in [10] for evaluating and comparing
markers. When $W(u) = 1$ at $0 < u < 1$ and $h$ is a linear function, $\hat\Delta_h$ is the generalized AUC statistic in [13]. When
$W(u) = 1$ at $0 < u < 1$, $\hat\Delta_h$ is the AUC statistic in [18], assuming no correlation between X and Y, which allows
for multiple observations per patient from each marker. When $W(u) = (u - a)/(b - a)$ for $0 < a < u < b < 1$ and
$h(\Omega_1, \Omega_2) = \Omega_1 - \Omega_2$, $\hat\Delta_h$ is the pAUC statistic in [8] for comparing two markers.

When there are two longitudinal markers in the study, the optimal combination for comparing the two markers
can be obtained using steps similar to those in the aforementioned multi-reader multi-test studies. Suppose $L = 2$.
Let $\Omega_{\ell,k}$ be the wAUC of marker $\ell$, $\ell = 1, 2$, at time $t_k$ and $\hat\Omega_{\ell,k}$ be its nonparametric estimator given by
$\hat\Omega_{\ell,k} = \int_0^1 \hat S_{D,\ell,k}\{\hat S^{-1}_{\bar D,\ell,k}(u)\}\,dW(u)$, where $\hat S_{D,\ell,k}$ and $\hat S_{\bar D,\ell,k}$ are defined by

$$\hat S_{D,\ell,k}(x) = \frac{1}{\sum_{i=1}^{M} m_{\ell ik}} \sum_{i=1}^{M} \sum_{p=1}^{m_{\ell ik}} I(X_{\ell ipk} > x), \quad\text{and}\quad \hat S_{\bar D,\ell,k}(x) = \frac{1}{\sum_{j=1}^{J} n_{\ell jk}} \sum_{j=1}^{J} \sum_{q=1}^{n_{\ell jk}} I(Y_{\ell jqk} > x). \tag{10}$$

Note that the estimation of $\Omega_{\ell,k}$ is based on every individual time point. One can take the difference of the wAUCs of
two markers, and simply average these differences over all time points. We may also use the following weighted
linear combination to possibly achieve a higher power to compare markers

$$\Delta_\ell = \left(\sum_{k=1}^{K} w_k\right)^{-1} \sum_{k=1}^{K} \left[w_k(\Omega_{1,k} - \Omega_{2,k})\right], \tag{11}$$

with positive and bounded weights $W = (w_1, w_2, \ldots, w_K)'$. The parameter $\Delta_\ell$ can be empirically estimated by

$$\hat\Delta_\ell = \left(\sum_{k=1}^{K} w_k\right)^{-1} \sum_{k=1}^{K} \left[w_k(\hat\Omega_{1,k} - \hat\Omega_{2,k})\right].$$

As in the previous section, the $2K \times 2K$ covariance matrix $\Sigma$ of $\hat\Omega = (\hat\Omega_{1,1}, \ldots, \hat\Omega_{2,K})'$ can be estimated
using the explicit expression in (A.1). Thus the estimated weights are given by the same expression as
(7).



3. Asymptotic variance expressions of the proposed statistics 

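In this section we derive the asymptotic variances for the proposed statistics in the multi-reader multi-test data and the
longitudinal data. We first show the explicit variance expressions for $\hat\Delta_m$, and then show the variance expression for the
more general statistic $\hat\Delta_h$ in (9) for the longitudinal data.

The numbers of abnormal locations within a diseased subject may differ, and so may the numbers of normal locations
within a non-diseased subject. Denote $m_\ell = \sum_{i=1}^{M} m_{\ell i}$ and $n_\ell = \sum_{j=1}^{J} n_{\ell j}$. Assume that $S_{D,\ell}$ and $S_{\bar D,\ell}$ have continuous
and positive derivatives, $S'_{D,\ell}$ and $S'_{\bar D,\ell}$. In the Appendix we show that the proposed statistic, $\hat\Delta_m$, for the multi-reader
multi-test ROC data is asymptotically normal when sample sizes are large. The variance of $\hat\Delta_m$ has the following expression
when sample sizes get large:

$$\mathrm{var}(\hat\Delta_m) = v_X + v_Y, \tag{12}$$

with

$$v_X = \frac{1}{(\sum_{r=1}^{R} w_r)^2} \sum_{1 \le \ell_1, \ell_2 \le 2R} \frac{\sum_{i=1}^{M} m_{\ell_1 i} m_{\ell_2 i}}{M m_{\ell_1} m_{\ell_2}}\, w_{\ell_1} w_{\ell_2} (-1)^{I(\ell_1,\ell_2)+1} \left( \int_0^1\!\!\int_0^1 \left[ S_{D,\ell_1,\ell_2}\{S^{-1}_{\bar D,\ell_1}(s), S^{-1}_{\bar D,\ell_2}(t)\} - S_{D,\ell_1}\{S^{-1}_{\bar D,\ell_1}(s)\} S_{D,\ell_2}\{S^{-1}_{\bar D,\ell_2}(t)\} \right] dW(s)\,dW(t) \right),$$

and

$$v_Y = \frac{1}{(\sum_{r=1}^{R} w_r)^2} \sum_{1 \le \ell_1, \ell_2 \le 2R} \frac{\sum_{j=1}^{J} n_{\ell_1 j} n_{\ell_2 j}}{M n_{\ell_1} n_{\ell_2}}\, w_{\ell_1} w_{\ell_2} (-1)^{I(\ell_1,\ell_2)+1} \left( \int_0^1\!\!\int_0^1 r_{\ell_1}(s) r_{\ell_2}(t) \left[ S_{\bar D,\ell_1,\ell_2}\{S^{-1}_{\bar D,\ell_1}(s), S^{-1}_{\bar D,\ell_2}(t)\} - st \right] dW(s)\,dW(t) \right),$$

where $w_{R+r} = w_r$, $I(\ell_1, \ell_2) = 1$ if $|\ell_2 - \ell_1| < R$, and 0 otherwise, and

$$r_\ell(u) = S'_{D,\ell}\{S^{-1}_{\bar D,\ell}(u)\} / S'_{\bar D,\ell}\{S^{-1}_{\bar D,\ell}(u)\}, \quad\text{for } \ell = 1, \ldots, L.$$

The marginal and joint survivor functions can also be empirically estimated.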

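Denote $m_\ell = \sum_{i=1}^{M}\sum_{k=1}^{K} m_{\ell ik}$ and $n_\ell = \sum_{j=1}^{J}\sum_{k=1}^{K} n_{\ell jk}$. We show in the Appendix that the proposed statistic, $\hat\Delta_h$ in
(9), for the longitudinal data is also asymptotically normal, and the variance of $\hat\Delta_h$ takes on the following form when
sample sizes are large,

$$\mathrm{var}(\hat\Delta_h) = v_X + v_Y, \tag{13}$$

where

$$v_X = \sum_{\ell_1=1}^{L} \sum_{\ell_2=1}^{L} \frac{\sum_{i=1}^{M} m_{\ell_1 i} m_{\ell_2 i}}{M m_{\ell_1} m_{\ell_2}} \frac{\partial h}{\partial \Omega_{\ell_1}} \frac{\partial h}{\partial \Omega_{\ell_2}} \int_0^1\!\!\int_0^1 \left[ S_{D,\ell_1,\ell_2}\{S^{-1}_{\bar D,\ell_1}(s), S^{-1}_{\bar D,\ell_2}(t)\} - S_{D,\ell_1}\{S^{-1}_{\bar D,\ell_1}(s)\} S_{D,\ell_2}\{S^{-1}_{\bar D,\ell_2}(t)\} \right] dW(s)\,dW(t),$$

and

$$v_Y = \sum_{\ell_1=1}^{L} \sum_{\ell_2=1}^{L} \frac{\sum_{j=1}^{J} n_{\ell_1 j} n_{\ell_2 j}}{M n_{\ell_1} n_{\ell_2}} \frac{\partial h}{\partial \Omega_{\ell_1}} \frac{\partial h}{\partial \Omega_{\ell_2}} \int_0^1\!\!\int_0^1 r_{\ell_1}(s) r_{\ell_2}(t) \left[ S_{\bar D,\ell_1,\ell_2}\{S^{-1}_{\bar D,\ell_1}(s), S^{-1}_{\bar D,\ell_2}(t)\} - st \right] dW(s)\,dW(t),$$

where $m_{\ell i} = \sum_{k=1}^{K} m_{\ell ik}$, $n_{\ell j} = \sum_{k=1}^{K} n_{\ell jk}$, and

$$r_\ell(u) = S'_{D,\ell}\{S^{-1}_{\bar D,\ell}(u)\} / S'_{\bar D,\ell}\{S^{-1}_{\bar D,\ell}(u)\}, \quad\text{for } \ell = 1, \ldots, L.$$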



The empirical or other types of smoothed estimators for the marginal and joint survivor functions $S_{D,\ell}$, $S_{\bar D,\ell}$,
$S_{D,\ell_1,\ell_2}(x_1, x_2)$, and $S_{\bar D,\ell_1,\ell_2}(y_1, y_2)$ can be used to estimate $v_X$ and $v_Y$. In the simulations and the example, we used the
empirical estimators. That is, we estimate $S_{D,\ell}$ and $S_{\bar D,\ell}$ using the expressions in (8), and we estimate $S_{D,\ell_1,\ell_2}(x_1, x_2)$
and $S_{\bar D,\ell_1,\ell_2}(y_1, y_2)$ as follows:

_^ M rn tl i mi 2 i K K 

§D,t 1 ,e a (xi,x2) = — m — y^ y^ y^ i{x tlipikl > xi,x l2iP2k2 > x 2 ), 

Z^i=l TO ft i=l pi=l p 2 =l fel=l fe 2 =l 

^ J n tli ne 2j K K 

SB 1 e 1 ,e 2 (yuy2) = ■== ~Y Y Y Y Y I ( Y ^oqik 1 > yi,Y t2jq2k2 > y 2 ). 

Aj = l n tj 3=1 91 = 1 g 2 = l fe 1= l fc 2 = l 

Thus, when the $\Omega_\ell$'s are AUCs, $v_X$ is given by

ll l ' 2 1<£iM<2R 1=1 ll l °- 

E[I(X (lWlkl > Yi ljpikl )}E[I(Xe 2ipikl > Yt ajpikl )]\, 



and vy is given by 

1 -i - x - dh dh 

: Mn tl n t3 V ^ ntl]ni23 dni 1 dn^ 



y~! ^ "/ : , "/ 7^77— ( /; ^ ' - v ' ; > Y eupiki)I(Xe 2ipikl > Y tljpikl )\ 

l<l u l 2 <2R 3=1 



E[I(Xf lipikl > Yf ljpikl )}E[I(X£ 2ipikl > Ye 2]Plkl )]^. 



4. Simulation studies 



We report simulation studies to evaluate the finite sample properties of the proposed statistics. We simulated both
multi-reader multi-test ROC data and longitudinal data. In multi-reader multi-test data, we considered the finite sample
performance of the variance expression. More importantly, we compared the simulated powers of the equal weight and the
optimal weight introduced in Section 2.2. We expect that the optimal weight results in better power than the equal weight.
In longitudinal data we considered the general setting where each subject is diagnosed repeatedly at each time point and
the number of repeated measures varies from subject to subject.

4.1. Multi-reader multi-test data

In the first simulation study we investigated the finite sample accuracy of the variance expression for multi-reader multi-test
data. We let $m_{\ell i} = n_{\ell j} = 1$, $R = 3$, and $L = 2$. We simulated 1000 datasets under multivariate normal and lognormal
distributions:

1. $X \sim N(\mu_X, \Sigma_X)$ and $Y \sim N(\mu_Y, \Sigma_Y)$, where $\mu_X = (1, 1)$, $\mu_Y = (0, \ldots, 0)$ and $\Sigma_X = \Sigma_Y$ is the variance-covariance
matrix with diagonal elements (1, 1.5, 2, 1, 1.5, 2) and correlation coefficient $\rho$;

2. $X \sim \mathrm{LogNormal}(\mu_X, \Sigma_X)$ and $Y \sim \mathrm{LogNormal}(\mu_Y, \Sigma_Y)$.

From simulated data we used the proposed statistic in Section 2.2, $\hat\Delta_m = \sum_{r=1}^{R} (\hat\Omega_r - \hat\Omega_{R+r})/R$, to estimate the difference in AUCs
by defining the weight function $W(u) = 1$ for $0 < u < 1$, and the difference in pAUCs by defining $W(u) = 1$ for $0 < u < 0.6$ and 0
otherwise. A 95% confidence interval for $\Delta_m$ was obtained using the variance expression derived in (13). Table 1 shows
biases, square roots of mean squared errors (RMSE), and simulated coverage of confidence intervals. The table shows
that coverage levels (between 89.2% and 95.7%) are reasonably close to the nominal level, and biases for comparing AUCs or pAUCs are close to zero. This
shows good performance of our estimator and associated asymptotic results.

In the second simulation study we compared the performance of the proposed method with the parametric
method by [3] and the semiparametric logistic regression method by [4] with regard to estimating the AUC. We
used the same setting as the first simulation study except changing $\mu_X$ to (1, 1, 1, 1.5, 2, 2.5). The biases and RMSEs
from the three methods are shown in Table 2. The results indicate that the proposed method and the semiparametric
method perform much better than the parametric method when the distribution assumptions are violated. They
also indicate that the semiparametric method performs as well as the proposed method. This is not surprising, as
can be seen from the description of the semiparametric method in Section 2 of [4]. The logistic regression fits the
regression parameters based on the following equation:

$$\mathrm{logit}\,P(D = 1) = \beta_0 + \beta_1 Z,$$

where $D$ is the disease status (with 1 being the diseased, and 0 being the non-diseased), $\beta_0$ and $\beta_1$ are regression
parameters, and $Z$ is the test result. After the regression parameter estimators, $\hat\beta_0$ and $\hat\beta_1$, are obtained, the
empirical ROC curve is estimated based on the new score, $\hat Z = \hat\beta_0 + \hat\beta_1 Z$. Since the ROC curve is invariant to
monotonically increasing transformations (here, whenever $\hat\beta_1 > 0$), the empirical ROC curve based on the new score remains the same as the empirical
ROC curve from the original test results.
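The invariance argument can be checked numerically with a short sketch. The measurements and the coefficients `b0` and `b1` below are hypothetical stand-ins for fitted values, not output of any actual logistic regression; the only requirement is a positive slope.

```python
def empirical_auc(xs, ys):
    """Proportion of (diseased, non-diseased) pairs with X > Y."""
    return sum(1 for x in xs for y in ys if x > y) / (len(xs) * len(ys))


# Hypothetical test results for diseased (xs) and non-diseased (ys) subjects.
xs = [2.1, 3.4, 0.8, 4.0]
ys = [1.0, 0.2, 2.5, 1.7]

# Hypothetical fitted coefficients; b1 > 0 makes the score an increasing function of Z.
b0, b1 = -1.3, 0.7

auc_raw = empirical_auc(xs, ys)
auc_score = empirical_auc([b0 + b1 * x for x in xs], [b0 + b1 * y for y in ys])
```

Because the linear score preserves the ordering of every (X, Y) pair, `auc_raw` and `auc_score` coincide, which is why the semiparametric and the nonparametric estimates behave alike in this setting.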

In the third simulation study we compared the simulated powers using the optimal weight versus the equal weight. We
again let $m_{\ell i} = n_{\ell j} = 1$, $R = 3$, and $L = 2$. We simulated 1000 datasets under multivariate normal distributions:
$X \sim N(\mu_X, \Sigma_X)$ and $Y \sim N(\mu_Y, \Sigma_Y)$, where $\mu_X = (2, 1, 1)$, $\mu_Y = (0, \ldots, 0)$ and $\Sigma_X = \Sigma_Y$ is the variance-covariance
matrix with diagonal elements (1, 1.5, 2, 2, 3, 2) and correlation coefficient $\rho$. We selected $M = J$ in (50, 100), and $\rho$ in
(−0.1, 0.2, 0.5). For each simulated dataset, we estimated the weighted difference in Section 2.2:

$$\hat\Delta_m = \left(\sum_{r=1}^{3} w_r\right)^{-1} \sum_{r=1}^{3} \left[w_r(\hat\Omega_r - \hat\Omega_{3+r})\right],$$

with both equal weights ($w_r = 1/3$) and the optimal weights given in (7). The AUC was estimated by defining the weight
function $W(u) = 1$ for $0 < u < 1$, and the pAUC was estimated by defining $W(u) = 1$ for $0 < u < 0.6$ and 0 otherwise.
The simulated power was then calculated as the proportion of rejections among the 1000 simulated datasets. Table 3 shows the
simulated powers for the comparison of AUCs and pAUCs. In every setting considered, the optimal weights result in larger
powers than the equal weights (for example, 0.723 versus 0.507 for the AUC with M = J = 50 and $\rho = -0.1$).

4.2. Longitudinal biomarker data

In this simulation study we generated multivariate log-normal correlated biomarker data. We generated data by taking
the exponential of multivariate normal data $X_i \sim N(\mu_{X,i}, \Sigma_{X,i})$ and $Y_j \sim N(0, \Sigma_{Y,j})$, where $\mu_{X,i} = (2, 2, 1, 1)$, and
$\Sigma_{X,i}$ and $\Sigma_{Y,j}$ are variance-covariance matrices. We let $L = 2$, $K = 3$, $M = J = (50, 200)$. To allow various cluster
sizes, we let $m_{\ell ik} = 2$ for the first half of diseased subjects, and $m_{\ell ik} = 4$ for the other half. For non-diseased subjects,
we let $n_{\ell jk} = 5$ for the first half, and $n_{\ell jk} = 3$ for the other half. We chose $\Sigma_{X,i} = (1 - \rho) I_i + \rho \mathbf{1}_i \mathbf{1}_i'$, where $I_i$ is the
$LKm_{\ell ik} \times LKm_{\ell ik}$ identity matrix and $\mathbf{1}_i$ is the $LKm_{\ell ik} \times 1$ matrix with all elements 1. A similar setting was applied to
define $\Sigma_{Y,j}$. Here $\rho$ gives the within-subject correlation. We let $\rho = 0.4$ for the diseased and $\rho = 0.3$ for the non-diseased. We
simulated 1000 datasets for each sample size, and obtained the estimate of the AUC difference between two biomarkers, $\hat\Delta_\ell$,
and its variance. Table 4 shows biases, square roots of mean squared errors (RMSE), and simulated coverage of confidence
intervals. This again shows good performance of our estimator for correlated biomarker data.
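The exchangeable (compound-symmetry) covariance used for the within-subject correlation can be sketched with the standard library alone. This is an illustration under stated assumptions, not the simulation code behind Table 4: the function names, the seed and the mean vector are hypothetical, and the shared-effect construction below is one standard way to obtain unit variances with common correlation ρ ≥ 0.

```python
import math
import random


def exchangeable_cov(dim, rho):
    """(1 - rho) * I + rho * 1 1': unit variances and common correlation rho."""
    return [[1.0 if a == b else rho for b in range(dim)] for a in range(dim)]


def draw_lognormal_cluster(mean, rho, rng):
    """One subject's measurements: exp of an exchangeable normal vector.

    Uses Z_a = sqrt(rho) * shared + sqrt(1 - rho) * own_a, which has
    Var(Z_a) = 1 and Cov(Z_a, Z_b) = rho for a != b (requires 0 <= rho <= 1).
    """
    shared = rng.gauss(0.0, 1.0)
    return [math.exp(mu + math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for mu in mean]


rng = random.Random(2010)  # arbitrary seed for reproducibility
cov = exchangeable_cov(3, 0.4)

# Hypothetical diseased subject with L * K * m = 2 * 3 * 2 = 12 measurements.
mean = [1.0] * 12
cluster = draw_lognormal_cluster(mean, rho=0.4, rng=rng)
```

Exponentiating the correlated normal vector gives log-normal measurements whose within-subject dependence is inherited from ρ, matching the structure described for the diseased (ρ = 0.4) and non-diseased (ρ = 0.3) groups.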



5. An example in the diagnosis of endometriosis 

The proposed nonparametric ROC summary statistics are applied in this section to data from a study on endometriosis
diagnosis. Endometriosis is a gynecological medical condition in which endometrial-like cells appear and flourish in areas
outside the uterine cavity, and it is typically seen in women of reproductive age. It has been estimated that endometriosis
occurs in roughly 5% to 10% of women. Despite its relatively high prevalence, substantive and methodological challenges
exist, including diagnostic proficiency. The Physician Reliability Study (PRS), an add-on to the Endometriosis: Natural History,
Diagnosis and Outcome (ENDO) Study [19], addressed this issue by investigating whether sequentially added clinical
information of a subject can aid in more accurately diagnosing endometriosis. Detailed study designs of
ENDO and PRS can be found in the aforementioned references. For demonstration purposes in this paper, we used review
results of 4 physicians (reviewers) in PRS on 150 participants. All 150 participants had recorded operative digital images
of their pelvic organs and descriptive drawings and notes, both from surgeons who conducted the laparoscopies on these
women in the ENDO study. The reviewers conducted their reviewing and diagnosis under two modalities. Modality one
corresponds to the setting where the reviewers are presented with participants' digital video/images, while modality two
corresponds to the setting where both digital video/images and surgeon's reports (drawings and notes) are presented. For
each participant under each modality, the reviewer answered a series of questions on what they observed from the clinical
information. These answers were later used to derive the rASRM scores [20], which we used as the diagnostic outcomes
in this paper. The visualized diagnosis from the original ENDO study of these participants was used as the gold standard.

For the first modality, the estimated AUCs are (0.71, 0.75, 0.63, 0.76) for the four reviewers; the corresponding numbers
are (0.83, 0.85, 0.75, 0.87) for the second modality. With equal weights $w_r = 1/4$, $r = 1, \ldots, 4$, the $\Delta$-statistic is $\hat\Delta_m =
-0.1145$, and its variance estimate is 0.0007475. We used (7) to obtain the optimal weights $(\hat w_1, \hat w_2, \hat w_3, \hat w_4) = (298.08,
401.16, 176.88, 560.48)$. Using these weights, the $\Delta$-statistic is given by $\hat\Delta_m = -0.1115$, and its variance estimate is
0.0006961. This indicates that the $\Delta$-statistic is more precisely estimated by using the optimal weights. The two-sided
p-value using the optimal weights is $2.36 \times 10^{-5}$, which is slightly smaller than the p-value $2.82 \times 10^{-5}$ using equal
weights. The two-sided p-values based on both sets of weights are close to zero, which indicates that these physicians
are able to give more accurate diagnoses of endometriosis by reviewing both digital images and surgeons' descriptive
reports.
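The two-sided p-values can be approximately recovered from the reported point estimates and variance estimates, assuming a standard normal reference distribution for the statistic divided by its estimated standard error. The function name below is hypothetical, and small discrepancies from the reported values are to be expected because the inputs are rounded.

```python
import math


def two_sided_p_value(estimate, variance):
    """Two-sided normal p-value for H0: difference = 0, via z = estimate / sqrt(variance)."""
    z = estimate / math.sqrt(variance)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Reported estimates and variance estimates from the endometriosis example.
p_equal = two_sided_p_value(-0.1145, 0.0007475)    # equal weights
p_optimal = two_sided_p_value(-0.1115, 0.0006961)  # optimal weights
```

Both values come out on the order of $10^{-5}$, with the optimal-weight p-value the smaller of the two, in line with the comparison reported for this example.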

6. Discussion

The proposed methods in the paper are nonparametric and can be applied to evaluate and compare diagnostic markers 
in the multireader multitest data and the longitudinal data. As illustrated in the simulation studies and the example, the 
proposed weighted method in the multireader multitest data tends to have a larger power than the existing methods. We 
also conducted simulation studies to investigate the finite sample performance of the proposed method in the longitudinal 
data setting. More complex correlated data in which both normal and abnormal locations may occur in the same subject 
have been considered in [21] and [22]. How to extend the proposed statistics to such a data setting is a future research 
topic. 

As pointed out by a reviewer, the proposed method is based on the empirical distribution estimators, and 
may not allow more complicated dependencies of observations in longitudinal data. For example, in the case 
of autoregressive dependencies, empirical estimators could not converge to target probabilities, especially when 
autoregression coefficients are greater than one. More research is merited to extend the proposed method in this 
direction. 

Appendix: Derivation of variance expression of $\hat\Delta_h$



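Assume that $S_{D,\ell}$ and $S_{\bar D,\ell}$ have continuous and positive derivatives, $S'_{D,\ell}$ and $S'_{\bar D,\ell}$. Suppose that $M/m_\ell \to \alpha_\ell$,
$M/n_\ell \to \beta_\ell$, $M/J \to \lambda$, $\sum_{i=1}^{M} m_{\ell_1 i} m_{\ell_2 i}/M^2 \to \eta^X_{\ell_1,\ell_2}$, and $\sum_{j=1}^{J} n_{\ell_1 j} n_{\ell_2 j}/M^2 \to \eta^Y_{\ell_1,\ell_2}$, as $M, J \to \infty$. Assume that
$\alpha_\ell$, $\beta_\ell$, $\eta^X_{\ell_1,\ell_2}$ and $\eta^Y_{\ell_1,\ell_2}$ are finite numbers. In addition, assume that the function $h$ has continuous partial derivatives of
order 2 at each point of an open set $(\Omega - \epsilon, \Omega + \epsilon)$, for $\epsilon > 0$. Let

$$r_\ell(u) = S'_{D,\ell}\{S^{-1}_{\bar D,\ell}(u)\} / S'_{\bar D,\ell}\{S^{-1}_{\bar D,\ell}(u)\}, \quad\text{for } \ell = 1, \ldots, L,$$

where $S'_{D,\ell}$ and $S'_{\bar D,\ell}$ are the first derivatives of $S_{D,\ell}$ and $S_{\bar D,\ell}$, respectively.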

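The asymptotic normality of $\hat\Omega$ is derived using results from [18], which gives that for markers $1, \ldots, L$,

$$\sqrt{M} \begin{pmatrix} \widehat{ROC}_1(u) - ROC_1(u) \\ \widehat{ROC}_2(u) - ROC_2(u) \\ \vdots \\ \widehat{ROC}_L(u) - ROC_L(u) \end{pmatrix} \to \begin{pmatrix} \sqrt{\alpha_1}\, U_{1,1}[S_{D,1}\{S^{-1}_{\bar D,1}(u)\}] - \sqrt{\beta_1}\, r_1(u) U_{2,1}(u) \\ \sqrt{\alpha_2}\, U_{1,2}[S_{D,2}\{S^{-1}_{\bar D,2}(u)\}] - \sqrt{\beta_2}\, r_2(u) U_{2,2}(u) \\ \vdots \\ \sqrt{\alpha_L}\, U_{1,L}[S_{D,L}\{S^{-1}_{\bar D,L}(u)\}] - \sqrt{\beta_L}\, r_L(u) U_{2,L}(u) \end{pmatrix},$$

where $U_{1,\ell}$ and $U_{2,\ell}$ are limiting Gaussian processes. Therefore, after some calculation, it follows that

$$\sqrt{M}(\hat\Omega - \Omega) \xrightarrow{d} N_L(0, \Sigma = \Sigma_1 + \Sigma_2), \tag{A.1}$$

where the $\{\ell_1, \ell_2\}$ element in $\Sigma_1$ is given by

$$\alpha_{\ell_1} \alpha_{\ell_2} \eta^X_{\ell_1,\ell_2} \int_0^1\!\!\int_0^1 \left[ S_{D,\ell_1,\ell_2}\{S^{-1}_{\bar D,\ell_1}(s), S^{-1}_{\bar D,\ell_2}(t)\} - S_{D,\ell_1}\{S^{-1}_{\bar D,\ell_1}(s)\} S_{D,\ell_2}\{S^{-1}_{\bar D,\ell_2}(t)\} \right] dW(s)\,dW(t), \tag{A.2}$$

and the $\{\ell_1, \ell_2\}$ element in $\Sigma_2$ is

$$\beta_{\ell_1} \beta_{\ell_2} \eta^Y_{\ell_1,\ell_2} \int_0^1\!\!\int_0^1 r_{\ell_1}(s) r_{\ell_2}(t) \left[ S_{\bar D,\ell_1,\ell_2}\{S^{-1}_{\bar D,\ell_1}(s), S^{-1}_{\bar D,\ell_2}(t)\} - st \right] dW(s)\,dW(t). \tag{A.3}$$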

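The Taylor expansion of $\hat\Delta_h$ at $\Omega$ gives

$$\hat\Delta_h - \Delta_h \approx (\hat\Omega - \Omega)' \nabla h(\Omega), \tag{A.4}$$

where $\nabla h(\Omega)$ is the gradient of $h$ evaluated at $\Omega$. The asymptotic variance of the right-hand side in (A.4) is given by

$$\nabla h(\Omega)'\, \mathrm{var}(\hat\Omega - \Omega)\, \nabla h(\Omega).$$

It follows that

$$\mathrm{var}(\hat\Delta_h) \approx \sum_{\ell_1=1}^{L} \sum_{\ell_2=1}^{L} \frac{\partial h}{\partial \Omega_{\ell_1}} \frac{\partial h}{\partial \Omega_{\ell_2}}\, \mathrm{cov}(\hat\Omega_{\ell_1}, \hat\Omega_{\ell_2}). \tag{A.5}$$

Using the covariance structures in (A.2) and (A.3) in (A.5), we can then obtain the asymptotic normality of $\hat\Delta_h$ by
combining (A.1) with the Cramér-Wold device [23].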

Table 1. Bias, RMSE and coverage for simulated multi-reader multi-test data

                          AUC                                   pAUC
Dist.   ρ     M(J)   Bias (in %)   RMSE     Coverage     Bias (in %)   RMSE     Coverage
Norm   -0.1    50     8.01E-02     0.0359   91.94%        3.17E-02     0.0304   92.52%
              100     3.43E-02     0.0483   89.47%        7.99E-02     0.0404   91.99%
              200    -1.93E-01     0.0481   92.18%       -1.00E-01     0.0396   94.40%
        0.2    50    -8.21E-02     0.0258   91.66%       -1.01E-01     0.0217   93.70%
              100     1.31E-01     0.0348   89.87%        1.03E-01     0.0296   91.20%
              200    -1.32E-01     0.0343   92.50%       -1.21E-01     0.0297   92.60%
        0.5    50    -6.38E-02     0.0175   94.12%       -2.01E-02     0.0151   95.70%
              100    -2.78E-02     0.0240   92.10%       -5.44E-02     0.0200   93.00%
              200     6.24E-02     0.0239   94.30%       -7.06E-03     0.0209   94.10%
LN     -0.1    50    -5.01E-02     0.0346   91.99%        1.69E-02     0.0354   92.29%
              100     7.77E-02     0.0478   89.21%        5.27E-02     0.0488   89.38%
              200    -1.38E-01     0.0493   91.98%       -8.07E-04     0.0464   92.59%
        0.2    50    -5.86E-02     0.0261   91.82%       -4.46E-02     0.0250   91.42%
              100     7.04E-02     0.0339   90.16%        7.59E-02     0.0352   89.39%
              200     3.88E-02     0.0340   92.40%        4.38E-02     0.0345   92.70%
        0.5    50    -5.39E-02     0.0169   94.43%       -3.60E-02     0.0172   93.93%
              100    -1.02E-01     0.0241   93.00%       -8.00E-02     0.0234   93.20%
              200    -4.62E-02     0.0239   94.40%       -5.02E-02     0.0243   93.80%

Norm denotes the normal distribution; LN denotes the lognormal distribution.

Table 2. Bias and RMSE of the proposed, parametric, and semiparametric methods

                      Proposed Method      Semiparametric Method     Parametric Method
Dist.   ρ     M(J)    Bias      RMSE       Bias      RMSE            Bias      RMSE
Norm   -0.1    50    -0.0140    0.0329    -0.0123    0.0318         -0.0131    0.0326
              100    -0.0126    0.0251    -0.0144    0.0249         -0.0138    0.0246
              200    -0.0136    0.0202    -0.0132    0.0203         -0.0135    0.0198
        0.2    50    -0.0149    0.0247    -0.0155    0.0440         -0.0117    0.0423
              100    -0.0150    0.0331    -0.0139    0.0327         -0.0125    0.0317
              200    -0.0140    0.0451    -0.0147    0.0262         -0.0136    0.0241
        0.5    50    -0.0133    0.0455    -0.0153    0.0456         -0.0168    0.0446
              100    -0.0132    0.0252    -0.0130    0.0327         -0.0151    0.0330
              200    -0.0132    0.0333    -0.0139    0.0258         -0.0121    0.0239
LN     -0.1    50    -0.0152   -0.0158    -0.0122    0.0360          0.0689    0.0779
              100    -0.0131   -0.0129    -0.0120    0.0265          0.0758    0.0814
              200    -0.0131   -0.0145    -0.0127    0.0203          0.0799    0.0833
        0.2    50    -0.0158    0.0446    -0.0139    0.0499          0.0706    0.0817
              100    -0.0120    0.0232    -0.0141    0.0351          0.0754    0.0810
              200    -0.0136    0.0327    -0.0129    0.0249          0.0807    0.0846
        0.5    50    -0.0158    0.0460    -0.0156    0.0498          0.0705    0.0838
              100    -0.0129    0.0255    -0.0120    0.0344          0.0791    0.0877
              200    -0.0145    0.0343    -0.0134    0.0256          0.0826    0.0884

Norm denotes the normal distribution; LN denotes the lognormal distribution.



Table 3. Simulated powers for comparing tests

AUC
          Equal Weight          Optimal Weight
ρ         M=J=50    100         50       100
-0.1      0.507     0.741       0.723    0.932
 0.2      0.335     0.541       0.659    0.909
 0.5      0.327     0.538       0.703    0.936

pAUC
          Equal Weight          Optimal Weight
ρ         M=J=50    100         50       100
-0.1      0.156     0.290       0.316    0.599
 0.2      0.141     0.212       0.280    0.584
 0.5      0.133     0.187       0.266    0.643

Table 4. Bias, RMSE and coverage for simulated correlated data

                    AUC                                   pAUC
Dist.   M(J)   Bias (in %)   RMSE     Coverage     Bias (in %)   RMSE     Coverage
Norm     50     -0.1182      1.0266   97.40%        0.0627       0.0184   97.40%
        100      0.0302      2.1682   96.60%        0.0931       0.0128   96.60%
        200      0.0038      1.5226   95.80%        0.0116       0.0090   96.00%
LN       50     -0.0768      0.0143   97.10%        0.0097       0.0125   97.10%
        100     -0.1126      0.0218   96.20%        0.0521       0.0093   96.80%
        200     -0.0445      0.0109   94.90%        0.0317       0.0188   95.00%

Norm denotes the normal distribution; LN denotes the lognormal distribution.